Summary

Cultural evolution creates the statistical structure of language

Abstract

Human language is unique in its structure: language is made up of parts that can be recombined in a productive way. The parts are not given but have to be discovered by learners exposed to unsegmented wholes. Across languages, the frequency distribution of those parts follows a power law. Both statistical properties—having parts and having them follow a particular distribution—facilitate learning, yet their origin is still poorly understood. Where do the parts come from and why do they follow a particular frequency distribution? Here, we show how these two core properties emerge from the process of cultural evolution with whole-to-part learning. We use an experimental analog of cultural transmission in which participants copy sets of non-linguistic sequences produced by a previous participant: This design allows us to ask if parts will emerge purely under pressure for the system to be learnable, even without meanings to convey. We show that parts emerge from initially unsegmented sequences, that their distribution becomes closer to a power law over generations, and, importantly, that these properties make the sets of sequences more learnable. We argue that these two core statistical properties of language emerge culturally both as a cause and effect of greater learnability.


Introduction

Fundamental properties of language

language segmentation

  • segmented into smaller parts → combined sequentially

skewed frequency distribution

  • power law distribution

↳ make language learnable

  • arise from cultural transmission → repeatedly learnt by multiple generations

fundamental challenge of language acquisition

  • discovery of relevant parts of language
  • in spoken language: no clear word boundaries

Segmentation

How segmentation works

statistical regularities

  • serve as cues for word boundaries in speech

transitional probabilities

  • which syllable is likely to follow another?
  • can cue word boundary

Transitional probabilities within a unit (e.g., a word) are higher than transitional probabilities across unit boundaries.

Take the sequence pretty baby as an example: many different words can appear after pretty (e.g., car, boy, hat, cat, and many more), but only a few sounds can appear after pre and result in a possible English sequence (premature, precise, and some more). This makes the transitional probability of syllables within a word (of the syllable ty given pre in our example) higher than that of syllables across word boundaries (ba given ty). A wealth of experimental evidence shows that infants, children and adults can track these transitional probabilities and use them to segment a novel continuous speech stream into its constituent parts (see 5 for a review).
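The idea above can be made concrete with a small sketch. It estimates the transitional probability P(next | current) from adjacent pairs in a syllable stream and places a boundary wherever that probability dips below a threshold. The toy stream, the three made-up "words", the function names, and the threshold of 0.9 are all assumptions for illustration; they are not taken from the paper or from the studies it reviews.

```python
from collections import Counter

def transitional_probabilities(stream):
    """Estimate P(next | current) for every adjacent pair in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    # Count each syllable only where it has a successor (all but the last item).
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment(stream, tps, threshold):
    """Place a boundary wherever the transitional probability is below threshold."""
    units, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tps[(a, b)] < threshold:
            units.append("".join(current))
            current = []
        current.append(b)
    units.append("".join(current))
    return units

# Hypothetical toy lexicon: three two-syllable "words" in a varying order.
words = {"A": ["pre", "ty"], "B": ["ba", "by"], "C": ["do", "gi"]}
order = "ABCACBBACBCA"
stream = [syllable for w in order for syllable in words[w]]

tps = transitional_probabilities(stream)
units = segment(stream, tps, threshold=0.9)  # threshold is an arbitrary choice
```

In this toy stream every within-word transition (such as ty given pre) has probability 1.0, while every transition across a word boundary is lower, so the dips line up with the word boundaries. Real speech is far noisier, and a fixed threshold is only one of several ways to turn these probabilities into boundaries.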


Origins of segmentation

statistical learning literature

  • focusses on segmentation in languages that already have parts


How does language end up being constructed of parts whose combination makes such a learning strategy effective?


easy answer

  • language is made up of parts because of the nature of the meanings we want to convey with language, which are themselves made up of parts
  • ⇒ “because the world has structure”

this study

  • language has structure, regardless of meaning

Power law

What is the power law?

frequency distribution

  • not all words occur with equal frequency → a few words are very frequent, most are rare (a power law, not an exponential)
  • ⇒ power law distribution (Zipfian)
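A Zipfian distribution means frequency falls off roughly as a power of rank, f(r) ∝ r^(-α), which shows up as an approximately straight line when log frequency is plotted against log rank. The sketch below estimates that slope with an ordinary least-squares fit. The corpus is invented for illustration, and a log-log regression is only a rough diagnostic, not the analysis used in the paper.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return type frequencies sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical corpus: the i-th type occurs about 120 / i times.
tokens = [f"w{i}" for i in range(1, 11) for _ in range(round(120 / i))]
freqs = rank_frequency(tokens)
slope = loglog_slope(freqs)  # close to -1 for this constructed corpus
```

A slope near -1 corresponds to the classic Zipfian case, while a uniform distribution gives a slope of 0. An exponential distribution would instead look straight on a plot of log frequency against raw rank, which is why the two should not be conflated.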

Advantage

facilitative for learning

  • performance in word segmentation improves when word frequencies follow a power law distribution
  • ⇒ cognitive preference for such distributions


Why does this facilitative distribution arise in language?


Learnability

learnability

  • both cause and effect of two fundamental properties of language:
    1. having parts
    2. having power law distribution

circular?

  • no/yes → the product of one generation’s learning is the target for the next
  • ⇒ iterated learning over multiple generations leads to languages that are optimised for learning biases inherent in transmitting individuals
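The transmission loop described above, where one learner's output becomes the next learner's input, can be sketched as a toy chain. Everything here is an assumption made for illustration: the four-colour alphabet, the set size and sequence length, the forgetting rate, and above all the learner's bias (when an item is forgotten, fall back on the continuation seen most often in training). It shows only the mechanics of iterated learning and is not the authors' experiment or model.

```python
import random
from collections import Counter, defaultdict

def learn_bigrams(sequences):
    """Count which colour follows which across the whole training set."""
    follows = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            follows[a][b] += 1
    return follows

def reproduce(seq, follows, forget_rate, rng):
    """Copy seq; a forgotten item is replaced by the most familiar continuation."""
    out = [seq[0]]
    for target in seq[1:]:
        if rng.random() < forget_rate and follows[out[-1]]:
            # Assumed bias: fall back on the most frequent follower seen in training.
            out.append(follows[out[-1]].most_common(1)[0][0])
        else:
            out.append(target)
    return "".join(out)

def transmission_chain(initial, generations, forget_rate=0.2, seed=0):
    """Each generation is trained on the previous generation's output."""
    rng = random.Random(seed)
    chain = [list(initial)]
    for _ in range(generations):
        training = chain[-1]
        follows = learn_bigrams(training)
        chain.append([reproduce(s, follows, forget_rate, rng) for s in training])
    return chain

# Hypothetical starting set: 6 random sequences of 12 colours each.
init_rng = random.Random(1)
initial = ["".join(init_rng.choice("RGBY") for _ in range(12)) for _ in range(6)]
chain = transmission_chain(initial, generations=5)
```

Because each learner's errors are pulled toward what it already found familiar, and those errors become part of the next learner's training data, any such bias can compound over generations. Whether a particular bias yields parts or a skewed distribution is an empirical question that a toy like this does not settle.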

Research questions and hypotheses

whole-to-part learning

  • start from single words and multiword sequences

distributional properties of language can, and will, emerge through the process of cultural transmission


↓ consequences of whole-to-part learning and iterated learning

1. gradual emergence of sub-parts

  • higher transitional probabilities within units than across unit boundaries

2. increasingly skewed distribution

  • distribution moves closer to a power law over generations

Methodology

experimental setting

  • learn a “language” of colour sequences, then reproduce it later
  • but: the task is framed as copying a whole, and copying is difficult
  • ⇒ will iterated learning biases take over?

Methods

Simon game

Results


Statistical structure emerges


Learnability as both cause and effect


Discussion

Note that we deliberately designed our experiment to be a non-linguistic task. The memory game is one that appears unrelated to language. The sequences have no similarity to language, and the task was not framed as one which required participants to learn about properties of a set of behaviours. Moreover, neither the whole sequences nor the emerging parts have meaning. Nevertheless, system-wide statistical structure emerges that is strikingly similar to that found in languages. This suggests that structure in language may similarly arise via cultural evolution through a highly general process of iterated sequence learning, rather than learning biases that are specifically adapted to language.